Ventral striatum and orbitofrontal cortex are both required for model-based, but not model-free, reinforcement learning.
Authors
Abstract
In many cases, learning is thought to be driven by differences between the value of rewards we expect and rewards we actually receive. Yet learning can also occur when the identity of the reward we receive is not as expected, even if its value remains unchanged. Learning from changes in reward identity implies access to an internal model of the environment, from which information about the identity of the expected reward can be derived. As a result, such learning is not easily accounted for by model-free reinforcement learning theories such as temporal difference reinforcement learning (TDRL), which predicate learning on changes in reward value, but not identity. Here, we used unblocking procedures to assess learning driven by value- versus identity-based prediction errors. Rats were trained to associate distinct visual cues with different food quantities and identities. These cues were subsequently presented in compound with novel auditory cues and the reward quantity or identity was selectively changed. Unblocking was assessed by presenting the auditory cues alone in a probe test. Consistent with neural implementations of TDRL models, we found that the ventral striatum was necessary for learning in response to changes in reward value. However, this area, along with orbitofrontal cortex, was also required for learning driven by changes in reward identity. This observation requires that existing models of TDRL in the ventral striatum be modified to include information about the specific features of expected outcomes derived from model-based representations, and that the role of orbitofrontal cortex in these models be clearly delineated.
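To make the abstract's distinction concrete, below is a minimal sketch (in Python, with assumed variable names and an illustrative feature encoding, not the authors' model) of why a standard TDRL error is blind to a change in reward identity, while an error computed over model-derived outcome features is not:

import numpy as np

def td_error(reward_value, v_state, v_next, gamma=0.95):
    # Classic temporal-difference reward prediction error:
    # delta = r + gamma * V(s') - V(s)
    return reward_value + gamma * v_next - v_state

# Swapping one food for another of equal value leaves the TD error at
# zero, so model-free TDRL predicts no unblocking of the added cue:
print(td_error(reward_value=1.0, v_state=1.0, v_next=0.0))  # -> 0.0

# An identity-sensitive error needs a model of *which* outcome is
# expected. Here outcomes are hypothetical feature vectors:
# [value, is_banana_pellet, is_grape_pellet].
expected = np.array([1.0, 1.0, 0.0])   # expect banana pellets
received = np.array([1.0, 0.0, 1.0])   # receive grape pellets
identity_error = np.linalg.norm(received - expected)
print(identity_error)  # > 0 although the value feature is unchanged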
Similar articles
What is the role of orbitofrontal cortex in dopamine-dependent reinforcement learning?
Orbitofrontal cortex (OFC) has been implicated in signalling reward expectancies, but its exact role, and how it differs from the role of ventral striatum (VS), is an open question. One idea is that VS is the seat of value learning in model-free, dopamine-dependent reinforcement learning, while OFC represents values in dopamine-independent model-based learning. However, recent results [...]
Fronto-striatal organization: Defining functional and microstructural substrates of behavioural flexibility
Discrete yet overlapping frontal-striatal circuits mediate broadly dissociable cognitive and behavioural processes. Using a recently developed multi-echo resting-state functional MRI (magnetic resonance imaging) sequence with greatly enhanced signal-to-noise ratios, we map frontal cortical functional projections to the striatum and striatal projections through the direct and indirect b...
The involvement of model-based but not model-free learning signals during observational reward learning in the absence of choice.
A major open question is whether computational strategies thought to be used during experiential learning, specifically model-based and model-free reinforcement learning, also support observational learning. Furthermore, the question of how observational learning occurs when observers must learn about the value of options from observing outcomes in the absence of choice has not been addressed. ...
Reinforcement learning models and their neural correlates: An activation likelihood estimation meta-analysis.
Reinforcement learning describes motivated behavior in terms of two abstract signals. The representation of discrepancies between expected and actual rewards/punishments (the prediction error) is thought to update the expected value of actions and predictive stimuli. Electrophysiological and lesion studies have suggested that mesostriatal prediction error signals control behavior through synaptic mod...
States versus Rewards: Dissociable Neural Prediction Error Signals Underlying Model-Based and Model-Free Reinforcement Learning
Reinforcement learning (RL) uses sequential experience with situations ("states") and outcomes to assess actions. Whereas model-free RL uses this experience directly, in the form of a reward prediction error (RPE), model-based RL uses it indirectly, building a model of the state transition and outcome structure of the environment, and evaluating actions by searching this model. A state predicti...
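For readers tracking the RPE/SPE distinction this abstract draws, a toy contrast between the two signals might look as follows (assumed formulations, not the article's implementation): an RPE updates values directly, while an SPE measures surprise against a learned transition model.

def rpe(r, v_s, v_s_next, gamma=0.95):
    # Model-free reward prediction error: drives value updates.
    return r + gamma * v_s_next - v_s

def spe(transitions, s, a, s_next):
    # Model-based state prediction error: surprise at the observed
    # successor state, used to update the transition model T(s'|s,a).
    return 1.0 - transitions[(s, a)].get(s_next, 0.0)

# Hypothetical transition model: 'approach' from 'cue' is believed to
# reach 'food' with probability 0.8.
T = {('cue', 'approach'): {'food': 0.8, 'nothing': 0.2}}

print(rpe(r=1.0, v_s=0.5, v_s_next=0.0))     # value surprise: 0.5
print(spe(T, 'cue', 'approach', 'nothing'))  # state surprise: 0.8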
Journal: The Journal of Neuroscience: The Official Journal of the Society for Neuroscience
Volume: 31, Issue: 7
Pages: -
Publication date: 2011